We formulate the head-to-head matchups between Major League Baseball pitchers and batters 
from 1954 to 2008 as a bipartite network of mutually-antagonistic interactions. We consider both 
the full network and single-season networks, which exhibit interesting structural changes over time. 
We find interesting structure in the networks and examine its sensitivity to baseball's rule changes. 
We then study a biased random walk on the matchup networks as a simple and transparent way 
to compare the performance of players who competed under different conditions and to include 
information about which particular players a given player has faced. We find that a player's position 
in the network does not correlate with his success in the random walker ranking but instead has a 
substantial effect on its sensitivity to changes in his own aggregate performance. 
Keywords: bipartite networks, ranking systems, random walkers, competition dynamics 



I. INTRODUCTION 

The study of networks has experienced enormous growth in recent years, providing foundational 
insights into numerous complex systems ranging from protein interaction networks in biology to 
online friendship networks in the social sciences [2, ?, ?]. Research on ecological and 
organizational networks has provided a general framework to study the mechanisms that mediate 
the cooperation and competition dynamics between individuals [?]. In these networks, competitive 
interactions result from the indirect competition between members of different populations, who 
either compete for the same resources or are linked through consumer-resource relationships. 
However, data on mutually-antagonistic interactions (i.e., individuals who directly fight or 
compete against each other) have been more difficult to collect [?, ?]. Mutually-antagonistic 
interactions also occur frequently in different social contexts such as sports. In the present 
paper, we consider head-to-head matchups between Major League Baseball (MLB) pitchers and 
batters: Pitchers benefit by "defeating" batters, and vice versa. Using data from retrosheet.org 
[?], we characterize the more than eight million MLB plate appearances from 1954 to 2008, 
considering full careers by examining head-to-head matchups over a multi-season ("career") 
network and single-season performances by constructing networks for individual seasons. 

To compare the performance of players, MLB uses votes by professional journalists to recognize 
career achievement of players through induction into a Hall of Fame (HOF) and single-season 
performance through awards such as Most Valuable Player (MVP) and Cy Young (for pitching 
performance) [?]. Although the HOF purports to recognize the best players of all time, the 
selection of players to it is widely criticized by fans and pundits each year because of the lack 
of consistency when, e.g., comparing players from different eras, who play under fundamentally 
different conditions (in different ballparks, facing different players, etc.) [?, ?]. Such 
arguments come to the fore when attempting to draw comparisons between players elected to the 
HOF and others who did not make it. For instance, how can one tell whether Jim Rice (elected to 
the HOF in 2009) had a better career than Albert Belle (who dropped off the ballot because of low 
vote totals after only two years) [?]? Does Bert Blyleven, who appeared on 62.7% of the HOF 
ballots in 2009 (short of the 75% required for election), belong in the HOF? Is Sandy Koufax, who 
played from 1955 to 1966 and is in the HOF, better than Pedro Martinez, who was still active 
during the 2008 season and who will presumably eventually be elected to the HOF? To address such 
questions, it is insufficient to rely purely on raw statistics; one must also consider 
quantitative mechanisms for comparison between athletes who played under different conditions. 
We take a first, simple step in this direction through the study of biased random walkers on 
these graphs [?, 16], allowing us not only to construct a quantitative, systematic, and 
transparent ranking methodology across different eras, but also to investigate the interplay 
between these dynamics and the underlying graph structure and to reveal key properties of 
mutually-antagonistic interactions that can potentially also be applied in other settings. 

While "water-cooler" discussions about the HOF can 
often be fascinating, as indicated by the above paragraph, 
we stress that the primary goal of our paper is to inves- 
tigate interesting features of the baseball networks and 
the impact that network structure can have on rankings 
rather than on the rankings themselves. While it is nec- 
essary to include some example rank orderings for the 
purpose of such a discussion, it is important to note that 
the rankings we show in the present paper must be taken 
with several grains of salt because our efforts at simplic- 
ity, which are crucial to highlighting the interplay be- 
tween network structure and player rankings, require us 
to ignore essential contributing factors (some of which 
we will briefly discuss) that are necessary for any serious 
ranking of baseball players. 

The rest of this paper is organized as follows. In Section II, we define and characterize the 
mutually-antagonistic baseball networks and study the time evolution of various graph 
properties. In Section III, we provide a description of the biased random walker dynamics that 
we employ as a ranking methodology across eras and for single-season networks. In Section IV, we 
study the interplay between the random walker dynamics and graph structure, paying special 
attention to the sensitivity of the player rankings. In Section V, we conclude 
the paper and discuss a number of potential applications 
of our work. We explain additional technical details in 
two appendices. 



II. NETWORK CHARACTERIZATION AND 
EVOLUTION 

We analyze baseball's mutually-antagonistic ecology 
by considering bipartite (two-mode) networks of head-to- 
head matchups between pitchers and batters. As shown in Fig. 1, bipartite networks are formed 
using two disjoint sets of vertices, P (pitchers) and B (batters), and the requirement that 
every edge connect a vertex in P to one in B [?, 17, ?] (keeping the pitching and batting 
performances of pitchers as two separate nodes). We consider such interactions in terms of three 
different bipartite representations (with corresponding matrices): (1) the binary matchups A, in 
which the element $A_{ij}$ equals 1 if pitcher i faced batter j at any point and 0 otherwise; 
(2) the weighted matchups W, in which the element $W_{ij}$ equals the number of times that i 
faced j; and (3) the weighted outcomes M, in which the element $M_{ij}$ equals a "score" or 
performance index, which in the case of pitcher-batter matchups is determined using what are 
known in baseball as "sabermetric" statistics (see the discussion below) [13, ?], characterizing 
the results 
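of all matchups between i and j. For each of these bipartite pitcher-batter networks, we also 
utilize corresponding square adjacency matrices (ordering pitchers before batters): 

\[
\hat{A} = \begin{pmatrix} 0 & A \\ A^T & 0 \end{pmatrix}, \quad
\hat{W} = \begin{pmatrix} 0 & W \\ W^T & 0 \end{pmatrix}, \quad
\hat{M} = \begin{pmatrix} 0 & -M \\ M^T & 0 \end{pmatrix},
\]

so that they are appropriately symmetric (A and W) and 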
anti-symmetric (M). We construct and analyze each of 
these representations for the single-season networks and 
the aggregate (career) network that contains all pitcher- 
batter interactions between 1954 and 2008. 
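
The three representations can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the paper: the record format `(pitcher, batter, score)`, the function names, and the toy data are all assumptions, and the sign convention for the anti-symmetric matrix (pitchers listed first, with a positive entry meaning that the row player did better than the column player) is chosen here so that it agrees with the random walker dynamics of Section III.

```python
import numpy as np


def build_matchup_matrices(records, pitchers, batters):
    """Build the binary (A), weighted (W), and outcome (M) matrices.

    records is an iterable of (pitcher, batter, score) tuples, one per plate
    appearance; score is a (hypothetical) mean-adjusted performance index.
    """
    row = {name: i for i, name in enumerate(pitchers)}
    col = {name: j for j, name in enumerate(batters)}
    W = np.zeros((len(pitchers), len(batters)))
    M = np.zeros_like(W)
    for pitcher, batter, score in records:
        W[row[pitcher], col[batter]] += 1.0    # one more plate appearance
        M[row[pitcher], col[batter]] += score  # accumulate the outcome
    A = (W > 0).astype(float)                  # 1 if the pair ever met
    return A, W, M


def to_square(X, antisymmetric=False):
    """Embed a P x B matrix in a (P+B) x (P+B) matrix, pitchers first."""
    P, B = X.shape
    sign = -1.0 if antisymmetric else 1.0      # assumed sign convention
    top = np.hstack([np.zeros((P, P)), sign * X])
    bottom = np.hstack([X.T, np.zeros((B, B))])
    return np.vstack([top, bottom])


# Toy data: two pitchers, two batters, three plate appearances.
records = [("p1", "b1", 0.5), ("p1", "b1", -0.2), ("p2", "b1", -0.3)]
A, W, M = build_matchup_matrices(records, ["p1", "p2"], ["b1", "b2"])
W_hat = to_square(W)
M_hat = to_square(M, antisymmetric=True)
```

With real data, one would call `build_matchup_matrices` once per season and sum the resulting W and M matrices (after aligning player indices) to obtain the career network.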

To identify the changes in the organization of baseball networks, we examine the graph 
properties of single-season networks. The distribution of the number of distinct opponents per 
player, i.e., of the player degree $k_i = \sum_j \hat{A}_{ij}$, is exponential over a large range 
and then has an even faster decay in the tail (see Fig. ??). The mean values of the geodesic path 
length between nodes and of the bipartite clustering coefficient are only somewhat larger than 
what would be generated by random assemblages (see Appendix A). However, as with 
mutually-beneficial interactions in ecological networks [20], the mutually-antagonistic baseball 
matchup networks exhibit non-trivial relationships between player degree and player strength 
$s_i = \sum_j \hat{W}_{ij}$, which represents the total number of opponents of a player (counting 
multiplicity) [1, 17]. As shown in Fig. ??, the relation between strength and degree is closely 
approximated by a power law $s \sim k^{\alpha}$ that starts in 1954 at $\alpha \approx 1.64$ for 
pitchers and $\alpha \approx 1.41$ for batters but approaches $\alpha \approx 1$ for each by 
2008. The six-decade trend of a decreasing power-law exponent indicates how real-life events 
such as the increase in the number of baseball teams through league expansion (e.g., in the 
1960s, 1977, 1993, and 1998), reorganization (e.g., in 1994, to three divisions in each league 
instead of two), interleague play (in 1997), and unbalanced schedules (in 2001) have modified the 
organization of the networks. 
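
As a rough illustration of these quantities, the sketch below computes degree and strength from a weighted matchup matrix and estimates the exponent of the strength-degree relation. The fitting procedure (ordinary least squares on log-log values) is an assumption made for this example, since the text does not state how the exponent was estimated, and the function names are hypothetical.

```python
import numpy as np


def degree_and_strength(W):
    """Return (degree, strength) for pitchers (rows) and batters (columns)."""
    W = np.asarray(W, dtype=float)
    A = W > 0                                   # binary matchups
    return {
        "pitchers": (A.sum(axis=1), W.sum(axis=1)),
        "batters": (A.sum(axis=0), W.sum(axis=0)),
    }


def power_law_exponent(k, s):
    """Estimate alpha in s ~ k**alpha by a least-squares fit in log-log space."""
    k = np.asarray(k, dtype=float)
    s = np.asarray(s, dtype=float)
    keep = (k > 0) & (s > 0)                    # logarithms need positive values
    alpha, _ = np.polyfit(np.log(k[keep]), np.log(s[keep]), 1)
    return alpha


# Toy weighted matchup matrix: 2 pitchers (rows) by 3 batters (columns).
W = np.array([[3, 0, 1], [2, 5, 0]])
k_p, s_p = degree_and_strength(W)["pitchers"]

# Synthetic check: data that follow s = 3 * k**1.5 exactly.
k_test = np.array([1.0, 2.0, 4.0, 8.0])
alpha_test = power_law_exponent(k_test, 3.0 * k_test**1.5)
```

An exponent near 1 means that strength grows in proportion to degree, i.e., that players face each distinct opponent a roughly constant number of times regardless of how many opponents they have.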

An important property mediating the competition dynamics of mutualistic networks in ecology is 
nestedness. Although the definition of nestedness may vary, a network is said to be nested when 
low-degree nodes interact with proper subsets of the interactions of high-degree nodes [18] (see 
Fig. ??). To calculate the aggregate nestedness in the binary matchup network A, we employed the 
nestedness metric based on overlap and decreasing fill (NODF) [21], which takes values in [0,1], 
where 1 designates a perfectly-nested network (see Appendix A). Figure 3B (black circles) shows 
that single-season baseball networks consistently have nestedness values of approximately 0.28. 
This value is slightly but consistently higher than those in randomized versions of the networks 
with similar distribution of interactions (red squares) [?], which we also observe to decrease 
slightly in time. In common with bipartite cooperative networks, this confirms that nestedness 
is a significant feature of these mutually-antagonistic networks. 

Although nestedness is defined as a global characteristic of the network, we can also calculate 
the individual contribution of each node to the overall nestedness [21]. When we compare node 
degrees and individual nestedness (see Appendix A) before 1973, batters and pitchers collapse 
well onto separate curves (see Fig. 3C). Starting in 1973, however, each of these splits into 
two curves (see Fig. 3D), corresponding to players in the two different leagues: the American 
League (AL) and the National League (NL). This structural change presumably resulted from the 
AL's 1973 introduction of the designated hitter (DH), a batter who never fields but bats in 
place of the team's pitchers (see Fig. 1), apparently causing the AL to become less nested due 
to the replacement of low-degree batting pitchers with higher-degree DHs. This suggests, as we 
discuss below, that the network position of a player might affect his own ranking (while, of 
course, network position is strongly influenced by a player's longevity and, thus, by his 
performance). 



III. BIASED RANDOM WALKERS 

To compare the performance of players, we rank play- 
ers by analyzing biased random walkers on the bipar- 
tite network M encoding the outcomes of all mutually- 
antagonistic interactions between each player pair. Our 
method generalizes the technique we previously used for 
NCAA football teams [?, 16], allowing us to rank players 
in individual seasons and in the 1954-2008 career network, 
yielding a quantitative, conceptually-clear method 
for ranking baseball players that takes a rather different 
approach from other sabermetric methods used to project 
player performance such as DiamondMind (which uses 
Monte Carlo simulations), PECOTA (which uses histor- 
ical players as a benchmark), and CHONE (which uses 
regression models) [22, 23]. 

To describe the aggregate interaction $M_{ij}$ between pitcher i and batter j, we need to 
quantify each possible individual pitcher-batter outcome. For simplicity, we focus on the 
quantity runs to end of inning (RUE) [14], which assigns a value to each possible plate 
appearance outcome (single, home run, strikeout, etc.) based on the expected number of runs that 
a team would obtain before the end of that inning, independent of the situational context (see 
Appendix B for specific values). Higher numbers indicate larger degrees of success for the 
batting team. For each season, we add the RUE from each plate appearance of pitcher i versus 
batter j to obtain a cumulative RUE for the pair. Note that any performance index that assigns a 
value to a specific mutually-antagonistic interaction can be used in place of RUE without 
changing the rest of our ranking algorithm. We then define the single-season outcome element 
$M_{ij}$ by the cumulative extent to which the batter's outcome is better ($M_{ij} > 0$) or worse 
($M_{ij} < 0$) than the mean outcome over all pitcher-batter matchups that season. When defining 
the career outcome element $M_{ij}$ for 1954-2008, we account for baseball's modern era 
offensive inflation [13, 14] by summing over individual seasons (i.e., relative to mean outcomes 
on a per-season basis). 
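
The construction of the outcome elements can be sketched as follows, using the RUE values listed in Appendix B. Two points are assumptions made for this illustration: the season mean is taken per plate appearance (the text says only "the mean outcome over all pitcher-batter matchups that season"), and the event labels and function names are hypothetical.

```python
# RUE values from Appendix B; the dictionary keys are illustrative labels.
RUE = {
    "out": 0.240, "strikeout": 0.207, "walk": 0.845, "hit_by_pitch": 0.969,
    "interference": 1.132, "fielders_choice": 0.240, "single": 1.025,
    "double": 1.311, "triple": 1.616, "home_run": 1.942,
}


def season_outcomes(plate_appearances):
    """Return ({(pitcher, batter): M_ij}, season mean RUE).

    plate_appearances is a list of (pitcher, batter, event) tuples.
    M_ij > 0 means the batter did better than the season average.
    """
    values = [RUE[event] for _, _, event in plate_appearances]
    mean_rue = sum(values) / len(values)   # assumed: mean per plate appearance
    M = {}
    for pitcher, batter, event in plate_appearances:
        key = (pitcher, batter)
        M[key] = M.get(key, 0.0) + RUE[event] - mean_rue
    return M, mean_rue


def career_outcomes(seasons):
    """Sum single-season outcomes, each taken relative to its own season mean."""
    total = {}
    for plate_appearances in seasons:
        M, _ = season_outcomes(plate_appearances)
        for key, value in M.items():
            total[key] = total.get(key, 0.0) + value
    return total


season = [("p", "b", "home_run"), ("p", "c", "strikeout")]
M, mean_rue = season_outcomes(season)
career = career_outcomes([season, season])
```

Because each plate appearance is measured against the mean of its own season, the entries of a single season sum to zero, and a high-offense season does not by itself inflate the career values of the batters who played in it.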



We initiate our ranking methodology by considering 
independent random walkers who each cast a single vote 
for the player that they believe is the best. Each walker 
occasionally changes its vote with a probability deter- 
mined by considering the aggregate outcome of a single 
pitcher-batter pairing, selected randomly from those in- 
volving their favorite player, and by a parameter quanti- 
fying the bias of the walker to move towards the winner of 
the accumulated outcome. A random walker that is con- 
sidering the outcome described by this matchup is biased 
towards but not required to choose the pitcher (batter) 
as the better player if $M_{ij} < 0$ ($M_{ij} > 0$). 

The expected rate of change of the number of votes cast for each player in the random walk is 
quantified by a homogeneous system of linear differential equations $\mathbf{v}' = D \cdot 
\mathbf{v}$, where 

\[
D_{ij} = \begin{cases}
\hat{W}_{ij} + r \hat{M}_{ij}, & i \neq j, \\
-s_i + r \sum_k \hat{M}_{ik}, & i = j.
\end{cases} \tag{1}
\]

The long-time average fraction of walkers $v_j$ residing at (i.e., voting for) player j is then 
found by solving the linear algebraic system $D \cdot \mathbf{v} = 0$, subject to an additional 
constraint that $\sum_j v_j = 1$. If the bias parameter r > 0, successful players will on 
average be ranked more highly. For r < 0, the random walker votes will instead tend 
toward the "loser" of individual matchups. 
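
The equilibrium can be computed directly for a small example. The sketch below is an illustration under stated assumptions, not the authors' implementation: it takes the square matrices with pitchers listed first and with a positive entry of the anti-symmetric outcome matrix meaning that the row player did better than the column player, and it imposes the normalization by appending it as an extra row of a least-squares system. The function names are hypothetical.

```python
import numpy as np


def rate_matrix(W_hat, M_hat, r):
    """Assemble D: off-diagonal W_hat + r*M_hat, diagonal -s_i + r*sum_k M_hat[i, k]."""
    W_hat = np.asarray(W_hat, dtype=float)
    M_hat = np.asarray(M_hat, dtype=float)
    D = W_hat + r * M_hat
    np.fill_diagonal(D, -W_hat.sum(axis=1) + r * M_hat.sum(axis=1))
    return D


def stationary_votes(D):
    """Solve D v = 0 together with sum(v) = 1."""
    n = D.shape[0]
    system = np.vstack([D, np.ones((1, n))])   # extra row enforces sum(v) = 1
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    v, *_ = np.linalg.lstsq(system, rhs, rcond=None)
    return v


# One pitcher (node 0) and one batter (node 1) who met once; the batter's
# outcome was one unit above the season mean.
W_hat = np.array([[0.0, 1.0], [1.0, 0.0]])
M_hat = np.array([[0.0, -1.0], [1.0, 0.0]])
D = rate_matrix(W_hat, M_hat, r=0.5)
v = stationary_votes(D)
```

In this two-player example the columns of D sum to zero, so the total number of votes is conserved, and the batter ends up with three quarters of the votes. Setting r = 0 removes the bias and splits the votes evenly.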

Equation (1) gives a general one-parameter system for 
a biased walker with probabilities that are linear in RUE, 
but the approach is easily generalized by using other 
functional forms to map observed plate appearance out- 
comes (in M) into selection probabilities. By restricting 
our attention to a form that is linear in RUE, the in- 
terpretation that the off-diagonal components of D cor- 
respond to random walker rate coefficients requires that 
these components remain non-negative, a preferable state 
that leads to a number of beneficial properties in the 
resulting matrix. For example, this allows us to apply 
the Perron-Frobenius theorem, which guarantees the ex- 
istence of an equilibrium v with strictly positive entries 
(and similarly guarantees the existence of positive solutions 
in algorithms such as PageRank) [16, 17, ?, ?]. 
In practice, this requirement is equivalent in the baseball 
networks to $|r| < 0.7$, so that the result of a home 
run in a single plate-appearance matchup (i.e., the case 
in which a batter faces a pitcher exactly once and hits 
a home run in that appearance) maintains a small but 
non-negative chance that a corresponding random walker 
will still select the pitcher. 

However, because the aggregate outcome of most pairings remains close to the mean, the bias in 
the random walk is small, and the rankings become essentially independent of the bias parameter. 
The linear expansion in bias r thereby yields a ranking with no remaining parameters beyond the 
statistically-selected RUE values, given by $\mathbf{v} = \mathbf{v}^{(0)} + r\mathbf{V} + 
O(r^2)$. Generalizing the similar expansion described in detail in Ref. [?], the zeroth-order 
term results in a constant number of votes per player, and the additional contribution at first 
order is given by the solution of a discrete Poisson equation on the graph: 

\[
\sum_j L_{ij} V_j = \frac{2}{n} \sum_j \hat{M}_{ij}, \tag{2}
\]

subject to the neutral charge constraint $\sum_j V_j = 0$. (By analogy with electrostatics, we 
refer to $V_j$ as the RUE 'charge' of node j.) In equation (2), n = P + B is the total number of 
players, $L = S - \hat{W}$ is the graph Laplacian, and S is the diagonal matrix with elements 
$s_{ii} = \sum_j \hat{W}_{ij}$ (and $s_{ij} = 0$ for $i \neq j$). Accordingly, we restrict our 
attention to the first-order ranking specified by V and obtained 
using the solution of equation (2). 
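
A minimal numerical sketch of the first-order ranking follows. It assumes that the Poisson equation has the right-hand side $(2/n) \sum_j \hat{M}_{ij}$, which is what one obtains by expanding $D \cdot \mathbf{v} = 0$ to first order in r around the uniform state $v_j = 1/n$, and it uses the Moore-Penrose pseudo-inverse to pick the zero-sum solution. The function name and the toy matrices are hypothetical.

```python
import numpy as np


def first_order_votes(W_hat, M_hat):
    """Solve L V = (2/n) * row sums of M_hat, subject to sum(V) = 0."""
    W_hat = np.asarray(W_hat, dtype=float)
    M_hat = np.asarray(M_hat, dtype=float)
    n = W_hat.shape[0]
    L = np.diag(W_hat.sum(axis=1)) - W_hat      # graph Laplacian L = S - W_hat
    rhs = (2.0 / n) * M_hat.sum(axis=1)
    V = np.linalg.pinv(L) @ rhs                 # minimum-norm solution
    return V - V.mean()                         # enforce the neutral-charge constraint


# One pitcher (node 0) and one batter (node 1) who met once; the batter's
# outcome was one unit above the season mean.
W_hat = np.array([[0.0, 1.0], [1.0, 0.0]])
M_hat = np.array([[0.0, -1.0], [1.0, 0.0]])
V = first_order_votes(W_hat, M_hat)
```

For this two-player network the full equilibrium is $v = ((1-r)/2, (1+r)/2)$, so the first-order coefficients are exactly $(-1/2, +1/2)$, which is what the sketch returns. Sorting players by V then gives a ranking that does not depend on the value of r.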

We tabulate this rank ordering separately for pitchers and batters, for individual seasons and 
the full career network. We compare the results of the random walker ranking to major baseball 
awards in Table I. We note that the rankings are highly correlated with the underlying RUE per 
plate appearance of each player ($r \approx 0.96$ for 2008 and similar for other seasons), so 
that the top players in the rankings produced by our method have a strong but imperfect 
correlation with the lists produced by ranking players according to raw RUE values. (This 
similarly holds for any other sabermetric quantity that one might use in place of RUE.) That is, 
it matters which players one has faced, and that is codified by the network. We note, for 
example, that the differences between random walker rankings and raw RUE rankings appear to 
appropriately capture the caliber of opponents (for example, pitchers from teams with relatively 
anemic offenses, such as the 2008 Nationals, Astros, and Reds, have a higher ranking in the 
random walker ranking, reflecting that they never had the good fortune of going up against their 
own teams' batters). We also compared the rankings with a leading metric in baseball analysis, 
ESPN's MLB Player Ratings, which combines ratings from ESPN, Elias, Inside Edge, and The Baseball 
Encyclopedia [26]. Of the top 99 players for 2008 who are listed in the Player Ratings, 12 did 
not meet our threshold for plate appearances. Comparing the random walker results for the 
remaining 87 players with the Player Ratings yields a correlation of $r \approx 0.5601$. We thus 
proceed to study the random walker results for the full career ranking both with confidence that 
it correlates with methods currently used for single-season analysis and caution that the 
ranking details do not capture all effects according to current best practices in quantitative 
baseball analysis. 

The full career ranking allows credible comparisons be- 
tween players from different eras. Interestingly, consider- 
ing the rankings restricted to individuals who played in 
at least 10 seasons during this time (HOF-eligible play- 
ers), we find that Barry Bonds (batter), Pedro Martinez 
(pitcher), and Mariano Rivera (relief pitcher) are the best 
players (in these categories) from 1954 to 2008. We show 
additional rankings in Table II. We especially note that 
Albert Belle (29th among batters) is ranked much higher 



than Jim Rice (115th), suggesting that Belle's hitting 
performance perhaps merits HOF membership more than 
that of Rice. Similarly, Bert Blyleven ranks higher not 
only than current HOF competitors such as Jack Morris 
and Tommy John but also higher than three HOF pitch- 
ers with over 300 wins (Steve Carlton, Phil Niekro, and 
Don Sutton), which is one traditional benchmark for se- 
lecting elite pitchers. Direct comparison with other rank 
orderings of players across different eras would necessi- 
tate restriction to sufficiently similar time periods and 
is thus beyond the network-science focus of the present 
study. 



IV. LINKING STRUCTURE TO 
PERFORMANCE 

As previously suggested, the network architecture 
should have important effects on the performance of 
players. In particular, central players in the network 
might have a systematic advantage in the rankings relative to those who are not as well 
connected. Such structurally-important players (see Table II for examples), who have high values 
for both betweenness centrality and nestedness, have had long (and usually extremely successful) 
careers, so it is of significant interest, yet difficult, to gauge the coupled effects on their 
rank ordering from statistical success versus structural role in the network. In fact, we found 
no correlation ($r \approx 0.001$) between a player's position (i.e., individual nestedness and 
betweenness) and his success measured by the fraction of votes received. 

Hence, we investigate this connection further via the sensitivity of rankings to changes in 
outcomes in individual pitcher-batter pairs, which is formulated using the Moore-Penrose 
pseudo-inverse $L^+$ of the graph Laplacian. Consider changing the outcome of the single edge 
that corresponds to the aggregate matchup between players i and j. If we increase the former's 
aggregate RUE by a unit amount at the expense of the latter, then the total RMS change in votes 
V is proportional to the difference between the ith and jth columns of $L^+$. This difference 
yields a node-centric measure of the sensitivities of rankings to individual performances: the 
constraint $\sum_j L^+_{ij} = 0$ yields that $L^+_{ii}$ (the diagonal element of the graph 
Laplacian pseudo-inverse), the direct control that player i has over his own ranking, is equal 
and opposite to the total change his performance directly imposes on the rest of the network. 
Additionally, as illustrated in Fig. ??A, the quantity $L^+_{ii}$ is closely related to the RMS 
changes in votes across the network due to the performance of player i. 

The element $L^+_{ii}$ is related to the mean of the commute times between nodes i and j 
(averaging over all j) [28]. Specifically, under our constraints, the sum of the commute times 
$L^+_{ii} + L^+_{jj} - 2L^+_{ij}$ over j yields a linear function of $L^+_{ii}$. Consequently, 
$L^+_{ii}$ provides a node-based measure of the average distance from node i to the rest of the 
network. This definition of average commute time has similarities with the measures known as 
information centrality [29] and random walk centrality [30] (though the results of applying the 
different measures can still be quite different). The negative relationship between $L^+_{ii}$ 
and both betweenness centrality and nestedness, which we show in Fig. ??, thus yields a 
corresponding negative relationship between the mean commute distance and the betweenness and 
nestedness of a player. A player who is highly embedded in the network (i.e., one with high 
individual nestedness) has a small mean commute distance to the rest of the network, and the 
ranking of that player is not very sensitive to the outcome of a single matchup. In contrast, a 
player who is in the periphery of the network (i.e., one with low individual nestedness) 
typically has a very large mean commute distance to other portions of the graph, and his place 
in the rank ordering is consequently much more sensitive to the results of his individual 
matchups [?]. This would suggest that players in the AL tend on average to be more prone to 
changes in their own rankings than players in the NL (see Fig. ??). 

Remarkably, we can make these general notions much more precise, as $L^+_{ii} \approx 1/s_i$, 
where we recall that $s_i$ is the strength of node i (see Fig. ??B). Some similarity between 
these quantities is reasonably expected (cf. the role of relaxation times in a similar 
relationship with random walk centrality in Ref. [30], which can be quantified by an eigenvalue 
analysis). This simple relationship reveals a stunning organizational principle of this network: 
The global quantity of average commute time of a node is well-approximated by its strength, a 
simple local quantity. That is, in the appropriate perturbation analysis to approximate the 
Laplacian pseudo-inverse, the higher order terms essentially cancel out, contributing little 
beyond the (zeroth-order) local contribution. We also found a rougher relationship for 
nestedness and betweenness (see Fig. 6). 
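
The quantities in this section can be explored numerically with a short sketch. It is an illustration under assumptions: the toy matchup matrix and the function names are hypothetical, and the prefactor 2/n in the edge sensitivity is taken from the first-order expansion of Eq. (1) around the uniform state.

```python
import numpy as np


def laplacian_pseudoinverse(W_hat):
    """Moore-Penrose pseudo-inverse of the graph Laplacian L = S - W_hat."""
    W_hat = np.asarray(W_hat, dtype=float)
    L = np.diag(W_hat.sum(axis=1)) - W_hat
    return np.linalg.pinv(L)


def edge_sensitivity(L_plus, i, j):
    """Change in V when one unit of outcome moves from player j to player i."""
    n = L_plus.shape[0]
    return (2.0 / n) * (L_plus[:, i] - L_plus[:, j])   # assumed prefactor 2/n


def summed_commute_terms(L_plus):
    """For each node i, sum over j of L+_ii + L+_jj - 2 L+_ij."""
    d = np.diag(L_plus)
    return (d[:, None] + d[None, :] - 2.0 * L_plus).sum(axis=1)


# Toy network: 2 pitchers (nodes 0, 1) and 3 batters (nodes 2, 3, 4).
W = np.array([[4.0, 2.0, 1.0], [3.0, 0.0, 5.0]])
W_hat = np.block([[np.zeros((2, 2)), W], [W.T, np.zeros((3, 3))]])
L_plus = laplacian_pseudoinverse(W_hat)

n = W_hat.shape[0]
diag = np.diag(L_plus)
linear_form = n * diag + np.trace(L_plus)    # linear function of L+_ii
inverse_strength = 1.0 / W_hat.sum(axis=1)   # local estimate to compare with diag
```

Because the rows of the pseudo-inverse sum to zero on a connected network, `summed_commute_terms(L_plus)` equals `linear_form` exactly, which is the linear relation described above. Comparing `diag` with `inverse_strength` on real single-season networks is one way to examine how well the local approximation holds.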

These results have two interesting implications. First, 
they reveal that the success of well-connected players de- 
pends fundamentally on a strong aggregate performance 
rather than just on their position in the network. Sec- 
ond, they imply that neophyte players would need to 
face well-connected players if they want to establish a 
stronger connection to the network and a ranking that is 
less vulnerable to single matchups. Similarly, recent re- 
search on mutualistic networks in ecology has found that 
neophyte species experience lower competition pressures 
by linking to well-connected species. Our findings on 
baseball-player rankings suggest the possibility of finding 
similar competition patterns in mutually-antagonistic in- 
teractions in ecological and social networks. 



V. CONCLUSIONS 

Drawing on ideas from network science and ecology, 
we have analyzed the structure and time-evolution of 



mutually-antagonistic interaction networks in baseball. 
We considered a simple ranking system based on biased 
random walks on the graphs and used it to compare 
player performance in individual seasons and across en- 
tire careers. We emphasize that our ranking methodology 
is overly simplistic, having noted several considerations 
that one might use to improve it (see, e.g., Appendix B) 
while maintaining a network framework that accounts for 
which players each player has faced. We also examined 
how the player rankings and their sensitivities depend on 
node-centric network characteristics. 

We expect that similar considerations might be useful 
for developing a better understanding of the interplay be- 
tween structure and function in a broad class of compet- 
itive networks, such as those formed by antigen-antibody 
interactions, species competition for resources, and com- 
pany competition for consumers. Given the motivation 
from ecology, we are optimistic that this might lead to 
interesting ecological insights, compensating for the difficulty 
in collecting data on the regulatory dynamics of 
mutually-antagonistic networks in ecology (such as the 
ones formed by parasites and free-living species [?]) or 
assessing the potential performance of invasive species 
from different environments [31]. 



APPENDIX A: QUANTITIES FOR BIPARTITE 
NETWORKS 

Here, we review some important quantities for bipar- 
tite networks and discuss their values for the baseball 
matchup networks. 

A clustering coefficient for bipartite networks can be defined by [32] 

where $q_{imn}$ is the number of complete squares involving nodes i, m, and n; $\eta_{imn} = 1 + 
q_{imn}$ enforces the requirement in bipartite graphs that there are no links between nodes of 
the same population; and we recall that $k_i$ is the degree of node i. Hence, the numerator in 
(A1) gives 
the actual number of squares and the denominator gives the maximum possible number of squares. 
For the single-season baseball networks, we calculate the ratio $r_c = \langle C_4 \rangle / 
\langle C_4^r \rangle$ between the mean clustering coefficient $\langle C_4 \rangle$ over all 
nodes i and the mean clustering coefficient $\langle C_4^r \rangle$ generated by a randomization 
of the network that preserves the original degree distribution [33]. We found that baseball 
networks have average clustering coefficients that are just above those of random networks. 
Interestingly, the ratio $r_c$ decreases gradually (and almost monotonically from one season to 
the next) from $r_c \approx 2.5$ in 1954 to $r_c \approx 1.3$ in 2008. 

The geodesic betweenness centrality of nodes over the unweighted network A is defined by 

\[
b(i) = \sum_{j,k} \frac{\lambda_{j,k}(i)}{\lambda_{j,k}}, \tag{A2}
\]

where $\lambda_{j,k}(i)$ is the number of shortest paths between players j and k that pass 
through player i and $\lambda_{j,k}$ is the total number of shortest paths between players j and 
k. For the single-season baseball networks, we calculate the ratio $r_b = \langle b \rangle / 
\langle b_r \rangle$ between the mean path length $\langle b \rangle$ over all nodes i and the 
mean path length $\langle b_r \rangle$ generated by a randomization of the network that 
preserves the degree distribution [33]. As with clustering coefficients, we found that the mean 
path lengths of baseball networks are only slightly larger than those of random networks, 
finding in particular that $r_b \in (1,3)$. 

Nestedness is an important concept that has been ap- 
plied to ecological communities, in which species present 
in sites with low biodiversity are also present in sites 
with high biodiversity [36]. Although the general no- 
tion of nestedness may vary, the concept has nonethe- 
less been employed quite successfully in the analysis of 
ecological networks [18]. In a nested network, interac- 
tions between two classes of nodes (e.g., plants and ani- 
mals) are arranged so that low-degree nodes interact with 
proper subsets of the interactions of high-degree nodes. 
A nested network contains not only a core of high-degree 
nodes that interact with each other but also an impor- 
tant set of asymmetric links (i.e., connections between 
high-degree and low-degree nodes). The importance of 
nestedness measures is twofold: (1) they give a sense of 
network organization; and (2) they have significant implications 
for the stability and robustness of ecological 
networks [?]. 

To avoid biases in nestedness based on network size (i.e., the number of nodes), degree 
distribution, and other structural properties, we employ the nestedness calculations introduced 
recently in Ref. [18]. The aggregate nestedness is given by [21] 

\[
\mathrm{NODF} = \frac{\sum_{i<j} N_{ij} + \sum_{l<m} N_{lm}}{[P(P-1)/2] + [B(B-1)/2]}. \tag{A3}
\]

For every pair of pitchers (i and j), the quantity $N_{ij}$ is equal to 0 if $k_i < k_j$ and is 
equal to the fraction of common opponents if $k_i > k_j$. We also define a similar quantity for 
every pair of batters (l and m). The nestedness metric takes values in [0, 1], where 1 
designates a perfectly-nested network and 0 indicates a network with 
no nestedness. 
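
A compact sketch of this metric in plain Python is given below. It rests on some assumptions about details that the text leaves open: each unordered pair of same-type nodes contributes once, the "fraction of common opponents" is measured relative to the lower-degree node of the pair, and pairs with equal degree (or with an isolated node) contribute 0. The function names are hypothetical.

```python
from itertools import combinations


def paired_nestedness(neighbors_a, neighbors_b):
    """Fraction of the lower-degree node's opponents shared with the other node."""
    k_a, k_b = len(neighbors_a), len(neighbors_b)
    if k_a == k_b or min(k_a, k_b) == 0:
        return 0.0                     # no decreasing fill: contributes nothing
    return len(neighbors_a & neighbors_b) / min(k_a, k_b)


def nodf(A):
    """NODF of a binary matrix A (rows: pitchers, columns: batters), in [0, 1]."""
    P, B = len(A), len(A[0])
    rows = [{j for j in range(B) if A[i][j]} for i in range(P)]
    cols = [{i for i in range(P) if A[i][j]} for j in range(B)]
    total = sum(paired_nestedness(a, b) for a, b in combinations(rows, 2))
    total += sum(paired_nestedness(a, b) for a, b in combinations(cols, 2))
    return total / (P * (P - 1) / 2 + B * (B - 1) / 2)


nested = [[1, 1, 1],
          [1, 1, 0],
          [1, 0, 0]]
diagonal = [[1, 0, 0],
            [0, 1, 0],
            [0, 0, 1]]
nodf_nested = nodf(nested)
nodf_diagonal = nodf(diagonal)
```

The triangular matrix is perfectly nested and scores 1, whereas the diagonal matrix, in which no player's opponents form a subset of another's, scores 0.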

The NODF nestedness also allows us to calculate the individual nestedness of each pitcher 
(column) or batter (row) using the equation 

\[
N(i) = \sum_j N_{ij} / (T - 1), \tag{A4}
\]

where T = P (total number of columns) for pitchers, T = B (total number of rows) for batters, 
and $N_{ij}$ is calculated as above. In this way, the individual nestedness metric takes values 
in [0, 1], where 1 designates a perfectly-nested individual and 0 indicates an individual 
with no nestedness. 

The null model used to compare the empirical nested- 
ness is given by [HI 



2B 



^3 

2P ' 



(A5) 



where $q_{ij}$ is the occupation probability of a pairwise interaction
between node $i$ and node $j$, and we recall that
$B$ and $P$ are, respectively, the total number of nodes $j$
(batters) and nodes $i$ (pitchers) in the network. In a bipartite
network, $j$ and $i$ represent two different types of
nodes, so $q_{ij}$ is the mean of the occupation probabilities
of the row and column. Recent studies have shown that
nestedness values generated by this
null model lower the probability of incorrectly determining
an empirical nested structure to be significant [21].
For baseball networks, we calculated the Z-score
$Z = (\mathrm{NODF} - \langle \mathrm{NODF} \rangle)/\sigma$, where NODF
is the nestedness value of the empirical network
and $\langle \mathrm{NODF} \rangle$ and $\sigma$ are, respectively, the mean
and standard deviation of the nestedness values of random
replicates generated by the null model. For the baseball
networks, we found that $Z > 3$ for all seasons (see

Fig. EEs). 
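
The null-model comparison described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: each replicate fills cell $(j, i)$ independently with the probability given by the mean of the row and column occupation probabilities, the number of replicates and the random seed are arbitrary choices, and the compact `nodf` helper uses the common NODF convention (shared opponents divided by the smaller degree, ties contributing 0). All function names are hypothetical.

```python
import numpy as np

def nodf(M):
    """Aggregate NODF of a batters-by-pitchers binary matrix (assumed convention)."""
    M = np.asarray(M, dtype=float)
    total, pairs = 0.0, 0.0
    for A in (M, M.T):                       # batters (rows), then pitchers (columns)
        k = A.sum(axis=1)
        shared = A @ A.T
        frac = np.divide(shared, k[None, :],
                         out=np.zeros_like(shared), where=k[None, :] > 0)
        total += np.where(k[:, None] > k[None, :], frac, 0.0).sum()
        pairs += len(k) * (len(k) - 1) / 2
    return total / pairs

def occupation_probabilities(M):
    """q for every cell: mean of the column (pitcher) and row (batter) fill."""
    M = np.asarray(M, dtype=float)
    B, P = M.shape
    k_pitcher = M.sum(axis=0)                # number of batters each pitcher faced
    k_batter = M.sum(axis=1)                 # number of pitchers each batter faced
    return k_pitcher[None, :] / (2 * B) + k_batter[:, None] / (2 * P)

def nodf_zscore(M, n_rep=200, seed=0):
    """Z = (NODF - mean of null NODF) / standard deviation of null NODF."""
    rng = np.random.default_rng(seed)
    Q = occupation_probabilities(M)
    null = np.array([nodf(rng.random(Q.shape) < Q) for _ in range(n_rep)])
    return (nodf(M) - null.mean()) / null.std()

M = np.array([[1, 1, 1],
              [1, 1, 0],
              [1, 0, 0]])
print(occupation_probabilities(M))
print(nodf_zscore(M))
```

A toy 3-by-3 matrix is used only to keep the example small; the Z-score is meaningful only for networks of realistic size.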



APPENDIX B: DEFINITION OF RUNS TO END 
OF INNING (RUE) 

To quantify the outcome of each plate appearance,
we used the sabermetric quantity runs to end of inning
(RUE) [14], which assigns a value to each of the possible
outcomes of a plate appearance based on the expected
number of runs a team would obtain before the end of
that inning following that event, independent of game
context. (RUE can also be adjusted by subtracting the
initial run state [27].) Higher values indicate greater
success for the batting team. The batter events
(and their associated RUE values) are generic
out (0.240), strikeout (0.207), walk (0.845), hit by pitch
(0.969), interference (1.132), fielder's choice (0.240), single
(1.025), double (1.311), triple (1.616), and home run
(1.942).
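
As a minimal sketch of how these values might be applied, the lookup below maps each outcome to its RUE value and accumulates totals for every pitcher-batter pair. The outcome labels, the input format, and the function name are hypothetical choices made for illustration; only the numerical values come from the text.

```python
# RUE value assigned to each plate-appearance outcome (values from the text).
# The dictionary keys are illustrative labels, not identifiers from the paper.
RUE = {
    "generic_out": 0.240,
    "strikeout": 0.207,
    "walk": 0.845,
    "hit_by_pitch": 0.969,
    "interference": 1.132,
    "fielders_choice": 0.240,
    "single": 1.025,
    "double": 1.311,
    "triple": 1.616,
    "home_run": 1.942,
}

def accumulate_rue(plate_appearances):
    """Sum RUE and count plate appearances for each (pitcher, batter) pair.

    plate_appearances: iterable of (pitcher, batter, outcome) tuples.
    Returns {(pitcher, batter): (n_plate_appearances, total_rue)}.
    """
    totals = {}
    for pitcher, batter, outcome in plate_appearances:
        n, rue = totals.get((pitcher, batter), (0, 0.0))
        totals[(pitcher, batter)] = (n + 1, rue + RUE[outcome])
    return totals

# Hypothetical event log with placeholder player names.
log = [("P1", "B1", "single"), ("P1", "B1", "strikeout"), ("P2", "B1", "home_run")]
totals = accumulate_rue(log)
for pair, (n, rue) in totals.items():
    print(pair, n, round(rue, 3))
```

Because the ranking only needs a value per plate appearance, swapping in another per-event metric would amount to replacing the dictionary.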

Note that we are ignoring events such as passed balls 
and stolen bases that can occur in addition to the above 
outcomes in a given plate appearance. This might lead to 
some undervaluing in the ranking for a small number of 
position players (such as Tim Raines) who rely on stolen
bases. We also considered the metric known as weighted
on-base average (wOBA) [37], and note that any metric
that assigns a value to a specific plate appearance can be 
used in place of RUE without changing the rest of our 
ranking algorithm. This includes, in particular, popular 
sabermetric quantities such as win shares and value over 
replacement player (VORP) [HB3. 

One can also incor- 
porate ideas like ballpark effects into the metric employed 
at this stage of the algorithm without changing any other 
part of the method. Although it would make the method- 
ology more complicated (in contrast to our goals), it is 
also possible to generalize the algorithm to include more 
subtle effects such as estimates for when player perfor- 
mance peaks and how it declines over a long career. Some 
of the active players in the data set have not yet entered
a declining phase in their careers and might have higher
rankings now than they will when their careers are over.
We expect that the relatively high rankings of modern
players versus ones who retired long ago might also result
in part from the increased performance discrepancy between
the top players and average players in the present
era versus what used to be the case and in part from
performing well against the larger number of relatively
poor players occupying rosters because of expansion [38].
Finally, we note that batter-pitcher matchups are not
fully random but contain significant correlations (e.g.,
in a given baseball game, the entire lineup of one team
has plate appearances against the other team's starting
pitcher) that can be incorporated to generalize the random
walker process itself [27].

To include the outcome of players who did not have 
many plate appearances without skewing their rankings 
via small samples, we separately accumulate the results 
for all pitchers and batters with fewer than some thresh- 
old number of plate appearances K into a single "re- 
placement pitcher" and "replacement batter" to repre- 
sent these less prominent players. In the results pre- 
sented in this paper, we used the threshold K = 500 
both for single seasons and for the collective outcome ma- 
trix. Note that similar thresholds exist when determin- 
ing single-season leadership in quantities such as batting 
average (which requires 3.1 plate appearances per team 
game, yielding 502 in a 162-game season) and earned run 
average (1 inning per team game). 
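
The thresholding step can be sketched as below. This is an illustrative reading of the paragraph above, not the authors' code: it assumes that matchup results are stored as (pitcher, batter, plate appearances, total RUE) records, that a player's plate appearances are counted over all of his matchups, and that players with fewer than K plate appearances are relabeled before results are summed. The function name and record layout are hypothetical.

```python
from collections import Counter, defaultdict

def merge_low_volume_players(matchups, K=500):
    """Pool players with fewer than K plate appearances into replacement players.

    matchups: iterable of (pitcher, batter, plate_appearances, total_rue).
    Returns {(pitcher, batter): (plate_appearances, total_rue)} after pooling.
    """
    matchups = list(matchups)

    # Total plate appearances of every pitcher and batter across all matchups.
    pa_pitcher, pa_batter = Counter(), Counter()
    for pitcher, batter, pa, _ in matchups:
        pa_pitcher[pitcher] += pa
        pa_batter[batter] += pa

    merged = defaultdict(lambda: [0, 0.0])
    for pitcher, batter, pa, rue in matchups:
        p = pitcher if pa_pitcher[pitcher] >= K else "replacement pitcher"
        b = batter if pa_batter[batter] >= K else "replacement batter"
        merged[(p, b)][0] += pa
        merged[(p, b)][1] += rue
    return {pair: tuple(values) for pair, values in merged.items()}

# Toy data with a small threshold so that the pooling is visible.
toy = [("A", "x", 3, 2.0), ("A", "y", 1, 0.5), ("B", "x", 1, 1.0)]
pooled = merge_low_volume_players(toy, K=3)
for pair, values in pooled.items():
    print(pair, values)
```

In the toy data, pitcher B and batter y fall below the threshold and are absorbed into the replacement players, while A and x keep their own entries.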

TABLE I: Single-Season Awards and Random Walker Rankings. We show the MVP and Cy Young award winners for various
years from 1954 to 2008. In parentheses, we give the rank-order of the player within his own category (pitcher or batter) that
we obtained using our random walker ranking system applied to the corresponding baseball season. For most of the seasons,
there is good agreement between award winners and their random walker ranking. (Note that the Cy Young award was given
to a single pitcher, rather than one from each league, until 1967.)



Year  MVP (AL)               MVP (NL)             Cy Young (AL)         Cy Young (NL)
1954  Yogi Berra (11th)      Willie Mays (2nd)    N/A                   N/A
1958  Jackie Jensen (8th)    Ernie Banks (6th)    Bob Turley (14th)     Bob Turley (14th)
1963  Elston Howard (20th)   Sandy Koufax (1st)   Sandy Koufax (1st)    Sandy Koufax (1st)
1968  Denny McLain (4th)     Bob Gibson (1st)     Denny McLain (4th)    Bob Gibson (1st)
1973  Reggie Jackson (11th)  Pete Rose (6th)      Jim Palmer (13th)     Tom Seaver (1st)
1978  Jim Rice (3rd)         Dave Parker (1st)    Ron Guidry (1st)      Gaylord Perry (30th)
1983  Cal Ripken Jr. (11th)  Dale Murphy (3rd)    LaMarr Hoyt (21st)    John Denny (14th)
1988  Jose Canseco (3rd)     Kirk Gibson (17th)   Frank Viola (24th)    Orel Hershiser (7th)
1993  Frank Thomas (3rd)     Barry Bonds (1st)    Jack McDowell (17th)  Greg Maddux (3rd)
1998  Juan Gonzalez (18th)   Sammy Sosa (7th)     Roger Clemens (3rd)   Tom Glavine (10th)
2003  Alex Rodriguez (7th)   Barry Bonds (1st)    Roy Halladay (15th)   Eric Gagne (8th)
2008  Dustin Pedroia (23rd)  Albert Pujols (1st)  Cliff Lee (8th)       Tim Lincecum (1st)



Rank  Btw(P)            N(P)              R(RP)            R(SP)
1     Nolan Ryan        Jamie Moyer       Mariano Rivera   Pedro Martinez
2     Jim Kaat          Roger Clemens     Billy Wagner     Roger Clemens
3     Tommy John        Greg Maddux       Troy Percival    Roy Halladay
4     Dennis Eckersley  Mike Morgan       Trevor Hoffman   Curt Schilling
5     Jamie Moyer       Randy Johnson     Tom Henke        Sandy Koufax
6     Greg Maddux       David Wells       B. J. Ryan       Randy Johnson
7     Charlie Hough     Kenny Rogers      Armando Benitez  John Smoltz
8     Don Sutton        Terry Mulholland  John Wetteland   Mike Mussina
9     Phil Niekro       Jose Mesa         Keith Foulke     J. R. Richard
10    Roger Clemens     Tom Glavine       Robb Nen         Greg Maddux

Rank  Btw(B)            N(B)              R(B)
1     Julio Franco      Rickey Henderson  Barry Bonds
2     Rickey Henderson  Barry Bonds       Todd Helton
3     Carl Yastrzemski  Steve Finley      Mickey Mantle
4     Hank Aaron        Craig Biggio      Manny Ramirez
5     Pete Rose         Gary Sheffield    Frank Thomas
6     Tony Perez        Ken Griffey Jr.   Willie Mays
7     Joe Morgan        Luis Gonzalez     Mark McGwire
8     Dave Winfield     Julio Franco      Alex Rodriguez
9     Ken Griffey Jr.   Jeff Kent         Larry Walker
10    Al Kaline         Omar Vizquel      Vladimir Guerrero



TABLE II: Player Rankings. Top 10 pitchers (P) and batters (B) according to geodesic node betweenness (Btw), nestedness 
(N), and random walker ranking (R). Pitchers are divided into relief pitchers (RP) and starting pitchers (SP). In accordance 
with HOF eligibility, this table only includes players who played at least 10 seasons between 1954 and 2008. Note that if we 
consider all players with careers of at least 10 seasons, no matter how many of those seasons occurred between 1954 and 2008, 
the only change is that Ted Williams becomes the highest-ranking batter. If we consider all players with at least 8 seasons, the 
only additional change is that Albert Pujols is ranked just behind Barry Bonds. 

FIG. 1: Bipartite Baseball Networks. (A) A subset of the bipartite interactions between pitchers (left column) and batters
(right column) during the 1989 baseball season. The area of each circle is determined by the node degree (i.e., how many 
different opponents were faced). Each line indicates that a given pitcher faced a given batter, and the darkness of each line is 
proportional to the number of plate appearances that occurred (i.e., the node strength). (B) The matrix encoding the complete 
set of bipartite interactions from 1989, with pitchers (columns) and batters (rows) arranged from the lowest to the highest 
node degree. An element of the matrix is black if that particular pitcher and batter faced each other and white if they did not. 
Observe the presence of a core of high-degree players that are heavily connected to each other (top right corner), an important 
presence of asymmetric interactions (i.e., high-degree players connected to low-degree players), and a dearth of connections 
between low-degree players (bottom left corner), which are all characteristics of nested networks [l^]. Some of the batters are 
actually pitchers (e.g., Mitch Williams), as National League pitchers (and, since 1997, also American League pitchers) have a 
chance to bat and face a small number of pitchers while at the plate. 

FIG. 2: [Color online] Cumulative Degree Distribution. Semi-log plot of the cumulative degree distribution $P_{\mathrm{cum}}(k)$ for pitchers
and batters in the career (1954-2008) network. The empirical data (dots) are arranged in logarithmic bins. 

FIG. 3: [Color online] Time Evolution and Summary Statistics of the Baseball Networks. Panel A shows the relation between
player degree $k$ and player strength $s$ from 1954 to 2008. The vertical axis gives the value of the exponent $\alpha$ in the power-law
relationship $s \sim k^{\alpha}$ (see the discussion in the main text), where we observe that $\alpha$ tends to decrease as a function of time.
Shuffling the strengths in the network while keeping the player degrees fixed yields a power-law relationship with $\alpha \approx 1$ for all
years. Blue circles denote pitchers and gray crosses denote batters. Each error bar corresponds to one standard deviation. The
inset shows on a log-log scale the relationship between degree $k$ and strength $s$ for the 2008 season. Panel B shows the time
evolution of the network's nestedness (which we defined using the NODF metric [21]). Black circles and red squares represent,
respectively, the values for the original data and the standard null model II [18]. Each error bar again corresponds to one
standard deviation. Panels C and D show, respectively, the relationship between node degree and individual nestedness for the
1972 and 1973 networks. For comparison purposes, the degrees of pitchers and batters are respectively scaled by multiplicative
factors of $P/l$ and $B/l$, where $P$ is the number of pitchers, $B$ is the number of batters, and $l$ is the number of undirected edges
in the network. In 1973, the American League introduced the designated hitter rule, which caused a significant change in the
structure of subsequent networks. Between 1954 and 1972, pitchers and batters each collapse onto a single curve. From 1973 to
2008, however, pitchers and batters each yield two distinct curves, revealing a division between the American League (bottom)
and National League (top).

FIG. 4: [Color online] Network Quantities versus Graph Laplacian. We plot the diagonal elements of the Moore-Penrose
pseudo-inverse of the graph Laplacian of the network, $L^{\dagger}_{ii}$, versus (A) the root mean squared change of votes across the network
due to the RUE 'charge' at each node and (B) node strength. In each case, we use logarithmic coordinates on both axes. We
note in particular the $L^{\dagger}_{ii} \sim s^{-1}$ relationship in panel B.




FIG. 5: [Color online] Betweenness and Nestedness versus Graph Laplacian. We plot the diagonal elements of the Moore-Penrose 
pseudo-inverse of the graph Laplacian of the network versus (A) node betweenness and (B) individual nestedness. 

FIG. 6: [Color online] Degree, Strength, Betweenness, and Nestedness. We show a log-log plot of (A) player degree $k$ versus
betweenness centrality and (B) degree versus individual nestedness in the career networks. The insets show the analogous
relationships obtained by replacing degree with strength $s$. Pitchers are given by blue dots, and batters are given by gray
crosses. Pitchers with betweenness $b \approx 2 \times 10^{-4}$ and low degree $k$ tend to be position players who made a few pitching appearances
(e.g., Keith Osik), pitchers with short careers (e.g., Wascar Serrano), or recent pitchers with few Major League appearances
(e.g., John Van Benschoten, who has split time between the Major Leagues and the Minor Leagues since 2004).



